The Paradigm Shift in Artificial Intelligence
1. From Specialized to General-Purpose
The field of artificial intelligence has undergone a dramatic shift in how models are trained and deployed.
- Old paradigm (task-specific training): Early models such as convolutional neural networks (CNNs) or BERT were trained for a single objective (e.g., sentiment analysis only). Translation, summarization, and other capabilities each required a separate model.
- New paradigm (centralized pre-training + prompting): One massive model (a large language model, LLM) learns general world knowledge from internet-scale datasets. Simply changing the input prompt steers it toward almost any language task.
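The contrast between the two paradigms can be sketched as an interface: one function, many tasks, selected only by the prompt. The `llm` function below is a canned stand-in for a real pretrained model (its name and lookup logic are assumptions for illustration, not any actual API).

```python
# Hypothetical illustration of the prompting paradigm: the same "model"
# handles different tasks purely by changing the input prompt.

def llm(prompt: str) -> str:
    """Stand-in for a pretrained LLM: a canned lookup keyed on the task
    named at the start of the prompt, just to show the interface."""
    canned = {
        "Translate": "Bonjour",
        "Summarize": "A short summary.",
        "Sentiment": "positive",
    }
    for task, answer in canned.items():
        if prompt.startswith(task):
            return answer
    return "..."

# One model, three tasks -- only the prompt changes.
print(llm("Translate to French: Hello"))      # -> Bonjour
print(llm("Summarize: <long article>"))       # -> A short summary.
print(llm("Sentiment: I loved this movie!"))  # -> positive
```

Under the old paradigm, each of these three calls would have required a separately trained model.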
2. The Evolution of Model Architectures
- Encoder-only (the BERT era): Focused on understanding and classification. These models read text bidirectionally to capture deep context, but were not designed to generate new content.
- Decoder-only (the GPT/Llama era): The standard architecture of today's generative AI. These models use auto-regressive modeling to predict the next token, making them well suited to open-ended generation and conversational applications.
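The auto-regressive loop at the heart of decoder-only models can be shown with a toy sketch: predict one next token, append it, and repeat. The hand-written bigram table below stands in for a trained network (it is an assumption for illustration only).

```python
# Toy sketch of auto-regressive (decoder-style) generation: the model
# predicts one next token at a time, conditioned on everything so far.
# The "model" here is a hand-written bigram table, not a trained network.

BIGRAMS = {
    "once": "upon",
    "upon": "a",
    "a": "time",
}

def generate(prompt: str, max_new_tokens: int = 3) -> str:
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        nxt = BIGRAMS.get(tokens[-1])
        if nxt is None:          # no known continuation: stop early
            break
        tokens.append(nxt)       # feed the new token back as context
    return " ".join(tokens)

print(generate("once"))  # -> once upon a time
```

A real decoder replaces the bigram lookup with a neural network over the full context, but the generate-append-repeat structure is the same.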
3. The Core Drivers of the Shift
- Self-supervised learning: Training on vast amounts of unlabeled internet data removed the bottleneck of human annotation.
- Scaling laws: Empirical observations show that AI performance improves predictably as model size (parameter count), data volume, and compute grow.
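Scaling laws are typically expressed as power laws, e.g. loss falling as L(N) = a * N^(-alpha) + c with parameter count N. The sketch below uses made-up constants purely to show the shape of such a curve, not fitted values from any study.

```python
# Sketch of a scaling-law-style power law: predicted loss falls smoothly
# and predictably as parameter count N grows. Constants are illustrative
# assumptions, not fitted values.

def loss(n_params: float, a: float = 400.0,
         alpha: float = 0.3, c: float = 1.7) -> float:
    """L(N) = a * N^(-alpha) + c, with c as an irreducible-loss floor."""
    return a * n_params ** (-alpha) + c

for n in (1e6, 1e9, 1e12):
    print(f"N={n:.0e}  predicted loss={loss(n):.2f}")
```

Analogous power laws are observed for data volume and compute, which is what makes "just scale it up" a viable engineering strategy.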
Key Insight
AI has evolved from a "task-specific tool" into a "general-purpose agent" with emergent abilities such as reasoning and in-context learning.
Question 1
What is the primary difference between the "Old Paradigm" and the "New Paradigm" of AI?
Question 2
According to scaling laws, which three factors are fundamentally linked to model performance?
Challenge: Evaluating Architectural Fitness
Apply your knowledge of model architectures to real-world scenarios.
You are an AI architect tasked with selecting the right foundational approach for two different projects. You must choose between an Encoder-only (like BERT) or a Decoder-only (like GPT) architecture.
Task 1
You are building a system that only needs to classify incoming emails as "Spam" or "Not Spam" based on the entire context of the message. Which architecture is more efficient for this narrow task?
Solution: Encoder-only (e.g., BERT)
Because the task is classification and requires deep, bidirectional understanding of the text without needing to generate new text, an Encoder-only model is highly efficient and appropriate.
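A toy stand-in makes the encoder-style interface concrete: the model reads the whole message and emits a label, never generating text. The keyword scoring below is an assumption for illustration, not how BERT actually classifies.

```python
# Toy stand-in for an encoder-style classifier: read the full message,
# output a label from a fixed set -- no text generation involved.
# The keyword heuristic is an illustrative assumption only.

SPAM_WORDS = {"winner", "free", "prize", "urgent"}

def classify_email(text: str) -> str:
    """Return 'Spam' if at least two spam keywords appear."""
    words = set(text.lower().split())
    score = len(words & SPAM_WORDS)
    return "Spam" if score >= 2 else "Not Spam"

print(classify_email("Claim your free prize now"))    # -> Spam
print(classify_email("Meeting moved to 3pm tomorrow"))  # -> Not Spam
```

A real encoder-only model would replace the keyword score with a bidirectional reading of the full context, but the input-to-label shape of the task is the same.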
Task 2
You are building a creative writing assistant that helps authors brainstorm ideas and write the next paragraph of their story. Which architecture is the modern standard for this?
Solution: Decoder-only (e.g., GPT/Llama)
This task requires open-ended text generation. Decoder-only models are designed specifically for auto-regressive next-token prediction, making them the standard for generative AI applications.